Graph-Based Extractive Text Summarization Sentence Scoring Scheme for Big Data Applications

Abstract

The recent advancements in big data and natural language processing (NLP) have necessitated proficient text mining (TM) schemes that can interpret and analyze voluminous textual data. Text summarization (TS) acts as an essential pillar within recommendation engines. Despite the prevalent use of abstractive techniques in TS, an anticipated shift towards a graph-based extractive TS (ETS) scheme is becoming apparent. ETS models, although simpler and less resource-intensive, are key in assessing reviews and feedback on products or services. Nonetheless, current methodologies have not fully resolved concerns surrounding complexity, adaptability, and computational demands. Thus, we propose our scheme, GETS, utilizing a graph model to forge connections among words and sentences through statistical procedures. The structure encompasses a post-processing stage that includes sentence clustering. Employing the Apache Spark framework, the scheme is designed for parallel execution, making it adaptable to real-world applications. For evaluation, we selected 500 documents from the WikiHow and Opinosis datasets, categorized them into five classes, and applied the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) parameters for comparison, with the measures ROUGE-1, ROUGE-2, and ROUGE-L. The results include recall scores of 0.3942, 0.0952, and 0.3436 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively (when using the clustered approach). Through a juxtaposition with existing models such as BERTEXT (with 3-gram, 4-gram) and MATCHSUM, the scheme has demonstrated notable improvements, substantiating its applicability and effectiveness in real-world scenarios.
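The recall figures quoted in the abstract are ROUGE-N scores. As a rough illustration only, ROUGE-N recall can be sketched as the clipped n-gram overlap between a candidate summary and a reference summary, divided by the number of n-grams in the reference. The function names `ngrams` and `rouge_n_recall`, the whitespace tokenization, and the example sentences below are assumptions made for this sketch; they are not taken from the paper, whose exact evaluation setup (tokenizer, stemming, ROUGE-L computation) is not described here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by the reference n-gram count."""
    # Assumption: lowercase whitespace tokenization, no stemming or stopword removal.
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical review sentence and extracted summary, for illustration only.
reference = "the battery life is very good"
candidate = "battery life is good"
print(rouge_n_recall(candidate, reference, n=1))  # 4 of 6 reference unigrams matched
print(rouge_n_recall(candidate, reference, n=2))  # 2 of 5 reference bigrams matched
```

ROUGE-L, the third measure mentioned, is based on the longest common subsequence rather than fixed-length n-grams and is not covered by this sketch.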


Similar resources

Assessing sentence scoring techniques for extractive text summarization

0957-4174/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.eswa.2013.04.023 Authors named: Rafael Ferreira, L. de Souza Cabral, R.D. Lins, G. Pereira e Silva, F. Freitas, G.D.C. Cavalcanti, ...


Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization

Given the increasing number of documents, sites, online sources, and the users’ desire to quickly access information, automatic textual summarization has caught the attention of many researchers in this field. Researchers have presented different methods for text summarization as well as a useful summary of those texts including relevant document sentences. This study select...


Extractive Based Automatic Text Summarization

Automatic text summarization is the process of reducing the text content and retaining the important points of the document. Generally, there are two approaches for automatic text summarization: Extractive and Abstractive. The process of extractive based text summarization can be divided into two phases: pre-processing and processing. In this paper, we discuss some of the extractive based text ...


A new sentence similarity measure and sentence based extractive technique for automatic text summarization

The technology of automatic document summarization is maturing and may provide a solution to the information overload problem. Nowadays, document summarization plays an important role in information retrieval. With a large volume of documents, presenting the user with a summary of each document greatly facilitates the task of finding the desired documents. Document summarization is a process of...


Topical Coherence for Graph-based Extractive Summarization

We present an approach for extractive single-document summarization. Our approach is based on a weighted graphical representation of documents obtained by topic modeling. We optimize importance, coherence and non-redundancy simultaneously using ILP. We compare ROUGE scores of our system with state-of-the-art results on scientific articles from PLOS Medicine and on DUC 2002 data. Human judges ev...



Journal

Journal title: Information

Year: 2023

ISSN: 2078-2489

DOI: https://doi.org/10.3390/info14090472